Shannon index

The Shannon index, sometimes referred to as the Shannon-Wiener Index or the Shannon-Weaver Index,[1] is one of several diversity indices used to measure diversity in categorical data. It is the information entropy of the distribution, treating species as symbols and their relative abundances as probabilities.

This article treats its use in the measurement of biodiversity. The advantage of this index is that it takes into account both the number of species and the evenness of their abundances. The index increases either when additional species are present or when species evenness is greater.

The "Shannon-Weaver" name is a misnomer; apparently some biologists jumped to the conclusion that Warren Weaver, author of an influential preface to the book form[2] of Claude Shannon's 1948 paper[3] founding information theory, was a cofounder of this theory. Weaver did play a crucial role in the rapid postwar development of information theory in a different way, however; as an influential early administrator of the Rockefeller Foundation, he ensured that the first information theorists received generous research grants. Norbert Wiener had no hand in the index either, although his influential popularisation of cybernetics was often conflated with information theory in the 1950s.

Definitions

Interpreting the index

Typically the value of the index ranges from 1.5 (low species richness and evenness) to 3.5 (high species evenness and richness),[4] though values beyond these limits may be encountered. Because the Shannon index combines species numbers and the evenness of their abundances into a single figure, the result does not give an absolute description of a site's biodiversity. It is particularly useful when comparing similar ecosystems or habitats, as it can highlight one example being richer or more even than another. It is always necessary to inspect the data, or to use another index, to determine the underlying reasons for a difference.

Computing the index

H^\prime = -\sum_{i=1}^S (p_i \ln p_i )

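where S is the total number of species and p_i is the relative abundance of the ith species, p_i = n_i/N, with n_i the number of individuals of species i and N the total number of individuals. p_i is thus the probability that a randomly chosen individual belongs to species i.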
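A minimal Python sketch of the formula, assuming the input is a list of raw per-species counts n_i so that p_i = n_i / N. The function name `shannon_index` and the sample counts are hypothetical; species with a count of zero are skipped, following the usual convention that p ln p tends to 0 as p tends to 0.

```python
import math

def shannon_index(counts):
    """Return H' = -sum(p_i * ln p_i) for a list of per-species counts (hypothetical helper)."""
    total = sum(counts)
    h = 0.0
    for n in counts:
        if n > 0:  # absent species contribute nothing
            p = n / total
            h -= p * math.log(p)
    return h

# Hypothetical samples: four species, evenly and very unevenly represented.
print(round(shannon_index([10, 10, 10, 10]), 4))  # → 1.3863 (= ln 4)
print(shannon_index([37, 1, 1, 1]))               # smaller, because less even
```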

It can be shown that, for a given number of species, H^\prime has a maximum possible value H_\max = \ln S, which occurs when all species are present in equal numbers.

An alternative form is

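H^\prime = -\sum_{i=1}^S (p_i \ln p_i ) + {S-1 \over 2N}

The second term is a correction for the downward bias of the first term when the p_i are estimated from a finite sample of N individuals (the Miller–Madow correction).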
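A sketch of the bias-corrected form in Python. It assumes the correction term is added (the plain estimate tends to underestimate the index in small samples, so the correction raises it), that S counts only the species actually observed, and that N is the total number of individuals sampled. The name `shannon_index_corrected` is hypothetical.

```python
import math

def shannon_index_corrected(counts):
    """H' plus the bias correction (S - 1) / (2N) (hypothetical helper).

    Assumption: S is the number of observed species (count > 0) and N is
    the total number of individuals in the sample.
    """
    observed = [n for n in counts if n > 0]
    total = sum(observed)
    s = len(observed)
    h = -sum((n / total) * math.log(n / total) for n in observed)
    return h + (s - 1) / (2 * total)

print(shannon_index_corrected([10, 10, 10, 10]))  # ln 4 + 3/80
```

The correction shrinks as N grows, so for large samples the two forms agree closely.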

Proof that maximum evenness maximizes the index

The following proves that a population of given size has the maximum Shannon index if and only if every species present is represented by the same number of individuals.

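Substituting p_i = n_i/N, where n_i is the number of individuals of species i and N = \sum_{i=1}^S n_i is the total population, and expanding the index:

H^\prime = -\sum_{i=1}^S {n_i\over N} \ln {n_i\over N}
N H^\prime = -\sum_{i=1}^S n_i \left ( \ln n_i - \ln N \right )
= -\sum_{i=1}^S n_i \ln n_i + \ln N \sum_{i=1}^S n_i
= -\sum_{i=1}^S n_i \ln n_i + N \ln N
N H^\prime - N \ln N = -\sum_{i=1}^S n_i \ln n_i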

Now define H_s = -\sum_{i=1}^S n_i \ln n_i, so that H^\prime = (H_s + N \ln N)/N. For a given population, N is a positive constant and so is N \ln N; therefore maximizing H_s is equivalent to maximizing H^\prime.
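The identity N H' = H_s + N ln N can be checked numerically. This Python sketch uses hypothetical helper names `h_prime` and `h_s` and a hypothetical set of counts, all assumed positive so that every logarithm is defined.

```python
import math

def h_prime(counts):
    # Shannon index H' from raw per-species counts
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts)

def h_s(counts):
    # H_s = -sum(n_i ln n_i): no reference to the total N
    return -sum(n * math.log(n) for n in counts)

counts = [12, 7, 3, 1]  # hypothetical sample, all counts positive
n_total = sum(counts)

# N H' should equal H_s + N ln N, up to floating-point rounding.
lhs = n_total * h_prime(counts)
rhs = h_s(counts) + n_total * math.log(n_total)
print(abs(lhs - rhs) < 1e-9)  # → True
```

Because N is fixed, any two populations of the same size are ranked identically by H_s and by H'.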

Strategy

Split a population of arbitrary size into two groups, each receiving an arbitrary number of individuals and an arbitrary number of species. Within each group, every species has the same number of individuals as every other species in that group, but the number of individuals per species in the first group may differ from that in the second group.

If it can be proven that H_s is maximized when the number of individuals per species in the first group matches the number of individuals per species in the second group, then it follows that the population has a maximum index only when every species is evenly represented. This works because H_s, unlike H^\prime, does not depend on the total population: it is a sum of one term, -n_i \ln n_i, per species, so the H_s of a population is simply the sum of the H_s values of its two sub-populations. Since the population size is arbitrary, the result covers the smallest case, two species (the smallest number that can form two groups), whose index is therefore maximized when they are present in equal numbers. This supplies the base case, and the same argument supplies the inductive step, so the requirements of mathematical induction are satisfied.
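The additivity that the strategy relies on can be illustrated in Python. The two groups below are hypothetical, and the helpers `h_s` and `h_prime` are illustrative names, not from any library: H_s of the combined population equals the sum over its groups, while H' does not, because each p_i depends on the total N.

```python
import math

def h_s(counts):
    # H_s = -sum(n_i ln n_i); absent species contribute nothing
    return -sum(n * math.log(n) for n in counts if n > 0)

def h_prime(counts):
    # Shannon index H', which depends on the total N through p_i = n_i / N
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

group1 = [6, 6, 6]  # hypothetical: 3 species with 6 individuals each
group2 = [2, 2]     # hypothetical: 2 species with 2 individuals each
whole = group1 + group2

# H_s of the whole population is the sum over its two groups...
print(math.isclose(h_s(whole), h_s(group1) + h_s(group2)))              # → True
# ...but H' is not additive in the same way.
print(math.isclose(h_prime(whole), h_prime(group1) + h_prime(group2)))  # → False
```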

Proof

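Now divide the species into two groups: the first contains S - p species sharing N - k individuals, and the second contains the remaining p species sharing the other k individuals. Within each group the individuals are evenly distributed among the species present, so each species in the first group has (N-k)/(S-p) individuals and each species in the second group has k/p. Then: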

H_s = -\sum_{i=1}^{S-p} {N-k \over S-p} \ln {N-k \over S-p} 
    - \sum_{i=1}^p {k\over p} \ln {k \over p}  
  = -\left ( N-k \right ) \ln  {N-k \over S-p}  
    - k \ln {k\over p}.
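A quick Python check that the closed form agrees with summing -n_i ln n_i species by species. It assumes, as in the formula above, that the first group has S - p species sharing N - k individuals and the second has p species sharing k; the function names and the values N = 30, S = 5, k = 6, p = 2 are hypothetical.

```python
import math

def h_s_two_groups(n_total, s_total, k, p):
    """Closed form -(N-k) ln((N-k)/(S-p)) - k ln(k/p); needs 0 < k < N, 0 < p < S."""
    return (-(n_total - k) * math.log((n_total - k) / (s_total - p))
            - k * math.log(k / p))

def h_s_direct(n_total, s_total, k, p):
    # Sum -n_i ln n_i over every species, using the even split within each group.
    per1 = (n_total - k) / (s_total - p)  # individuals per species, first group
    per2 = k / p                          # individuals per species, second group
    return (-sum(per1 * math.log(per1) for _ in range(s_total - p))
            - sum(per2 * math.log(per2) for _ in range(p)))

# 3 species with 8 individuals each, and 2 species with 3 each.
print(math.isclose(h_s_two_groups(30, 5, 6, 2), h_s_direct(30, 5, 6, 2)))  # → True
```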

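Treating k as a continuous variable, H_s is a concave function of k on 0 < k < N (its second derivative, -1/(N-k) - 1/k, is negative), so the value of k that maximizes H_s is the one satisfying the equation: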

{d\over dk}\, H_s=0.

Differentiating,

\ln { N-k \over S-p} + (N-k){1 \over N-k} - \ln {k\over p} - k {1 \over k} = 0,
\ln {N-k\over S-p} = \ln {k \over p}.
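The derivative can be sanity-checked against a central finite difference of the closed form for H_s. This Python sketch uses hypothetical values (N = 30, S = 5, p = 2, k = 6) and illustrative function names; after the +1 and -1 terms cancel, the analytic derivative is ln((N-k)/(S-p)) - ln(k/p).

```python
import math

def h_s(k, n_total, s_total, p):
    # Closed form of H_s for the two-group split, as a function of k
    return (-(n_total - k) * math.log((n_total - k) / (s_total - p))
            - k * math.log(k / p))

def dh_s_dk(k, n_total, s_total, p):
    # Analytic derivative once the +1 and -1 terms cancel
    return math.log((n_total - k) / (s_total - p)) - math.log(k / p)

n_total, s_total, p, k = 30, 5, 2, 6.0  # hypothetical values
step = 1e-5
numeric = (h_s(k + step, n_total, s_total, p)
           - h_s(k - step, n_total, s_total, p)) / (2 * step)
print(abs(numeric - dh_s_dk(k, n_total, s_total, p)) < 1e-6)  # → True
```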

Exponentiating:

{N-k\over S-p} = {k \over p}, \quad \text{so} \quad p(N-k) = k(S-p), \quad k = {pN \over S}, \quad {k\over p} = {N-k \over S-p} = {N \over S}.

Since N_{i1} = {N-k \over S-p} and N_{i2} = {k \over p} are the numbers of individuals per species in the first and second groups, we get

N_{i1} = N_{i2} = {N\over S}.
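The algebra predicts that H_s peaks at k = pN/S, where both groups have N/S individuals per species. A brute-force scan in Python agrees for one hypothetical case (N = 30, S = 5, p = 2, so the predicted optimum is k = 12); like the derivation, it treats k as continuous, and the function name is illustrative.

```python
import math

def h_s(k, n_total, s_total, p):
    # Closed form of H_s for the two-group split
    return (-(n_total - k) * math.log((n_total - k) / (s_total - p))
            - k * math.log(k / p))

n_total, s_total, p = 30, 5, 2  # hypothetical population

# Scan k over a fine grid strictly inside (0, N) and keep the best value.
grid = [i / 1000 for i in range(1, n_total * 1000)]
best_k = max(grid, key=lambda k: h_s(k, n_total, s_total, p))

print(best_k)                                           # → 12.0 (= pN/S)
print(best_k / p, (n_total - best_k) / (s_total - p))   # → 6.0 6.0 (= N/S)
```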

Result

This completes the proof that the Shannon index is maximized when each species is present in equal numbers (see Strategy). In that case n_i = {N\over S}, so p_i = {1\over S}, and therefore:

H_\max = - \sum_{i=1}^S {1\over S} \ln {1\over S} = \ln S.
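For small integer populations the result can also be confirmed exhaustively. This Python sketch enumerates every way to distribute a hypothetical N = 12 individuals among S = 3 species (each species getting at least one) and reports the composition with the largest index; the helper names are illustrative.

```python
import math
from itertools import product

def shannon_index(counts):
    # H' = -sum(p_i ln p_i) from raw counts
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def compositions(n_total, s_total):
    """Yield every way to give each of s_total species at least one of n_total individuals."""
    for head in product(range(1, n_total), repeat=s_total - 1):
        last = n_total - sum(head)
        if last >= 1:
            yield head + (last,)

n_total, s_total = 12, 3  # hypothetical small population
best = max(compositions(n_total, s_total), key=shannon_index)
print(best)                                                  # → (4, 4, 4)
print(math.isclose(shannon_index(best), math.log(s_total)))  # → True
```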

References

  1. ^ Krebs, Charles (1989). Ecological Methodology. New York: HarperCollins.
  2. ^ Weaver, W.; Shannon, C.E. (1949). The Mathematical Theory of Communication. Urbana, Illinois: University of Illinois Press.
  3. ^ Shannon, C.E. (July and October 1948). "A mathematical theory of communication". Bell System Technical Journal 27: 379–423 and 623–656.
  4. ^ McDonald, Glen (2003). Biogeography: Space, Time and Life. John Wiley & Sons. p. 409.